81 research outputs found

    Automated extraction of chemical structure information from digital raster images

    Background: To search for chemical structures in research articles, diagrams or text representing molecules need to be translated to a standard chemical file format compatible with cheminformatic search engines. However, chemical information in research articles is often presented as analog diagrams of chemical structures embedded in digital raster images. Several software systems have been developed to automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, but their algorithmic performance and utility in cheminformatic research have not been investigated. Results: This paper critically reviews these systems and reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. The basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently, in sequence, from a graphical user interface, and the algorithm parameters can be readily changed, to facilitate additional development tailored to a specific chemical database annotation scheme. Compared with existing software programs such as OSRA, Kekule, and CLiDE, our results indicate that ChemReader outperforms them on several sets of sample images from diverse sources in terms of the rate of correct outputs and the accuracy of extracting molecular substructure patterns. Conclusion: The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Based on its stable performance and high accuracy, ChemReader may be sufficiently accurate for annotating chemical databases with links to scientific research articles. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/90875/1/Saitou8.pd

    Identification of correlated genetic variants jointly associated with rheumatoid arthritis using ridge regression

    Abstract: Using the North American Rheumatoid Arthritis Consortium genome-wide association dataset, we applied ridged, multiple least-squares regression to identify genetic variants with apparent unique contributions to variation of anti-cyclic citrullinated peptide (anti-CCP), a newly identified clinical risk factor for development of rheumatoid arthritis. Within a 2.7-Mbp region on chromosome 6 around the well-studied HLA-DRB1 locus, ridge regression identified a single-nucleotide polymorphism that was associated with anti-CCP variation when including the additive effects of other single-nucleotide polymorphisms in a multivariable analysis, but that showed only a weak direct association with anti-CCP. This suggests that multivariable methods can be used to identify potentially relevant genetic variants in regions of interest that would be difficult to detect based on direct associations. http://deepblue.lib.umich.edu/bitstream/2027.42/117369/1/12919_2009_Article_2814.pd

    A Rational Approach to Personalized Anticancer Therapy: Chemoinformatic Analysis Reveals Mechanistic Gene-Drug Associations

    Purpose: To predict the response of cells to chemotherapeutic agents based on gene expression profiles, we performed a chemoinformatic study of a set of standard anticancer agents assayed for activity against a panel of 60 human tumor-derived cell lines from the Developmental Therapeutics Program (DTP) at the National Cancer Institute (NCI). Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/41497/1/11095_2004_Article_465512.pd

    Classifications of ovarian cancer tissues by proteomic patterns

    Ovarian cancer is a morphologically and biologically heterogeneous disease. The identification of type-specific protein markers for ovarian cancer would provide the basis for more tailored treatments, as well as clues for understanding the molecular mechanisms governing cancer progression. In the present study, we used a novel approach to classify 24 ovarian cancer tissue samples based on the proteomic pattern of each sample. The method involved fractionation according to pI using chromatofocusing with analytical columns in the first dimension, followed by separation of the proteins in each pI fraction using nonporous RP-HPLC, which was coupled to an ESI-TOF mass analyzer for molecular weight (MW) analysis. A 2-D mass map of the protein content of each type of ovarian cancer tissue sample, based upon pI versus intact protein MW, was generated. Using this method, the clear cell and serous ovarian carcinoma samples were histologically distinguished by principal component analysis and clustering analysis based on their protein expression profiles, and subtype-specific biomarker candidates of ovarian cancers were identified, which could be further investigated for future clinical study. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/55853/1/5846_ftp.pd

    Ontology-Based Combinatorial Comparative Analysis of Adverse Events Associated with Killed and Live Influenza Vaccines

    Vaccine adverse events (VAEs) are adverse bodily changes occurring after vaccination. Understanding adverse event (AE) profiles is a crucial step in identifying serious AEs. Two different types of seasonal influenza vaccines have been used on the market: trivalent (killed) inactivated influenza vaccine (TIV) and trivalent live attenuated influenza vaccine (LAIV). The adverse event profiles induced by these two groups of seasonal influenza vaccines were studied based on data drawn from the CDC Vaccine Adverse Event Report System (VAERS). Extracted from VAERS were 37,621 AE reports for four TIVs (Afluria, Fluarix, Fluvirin, and Fluzone) and 3,707 AE reports for the only LAIV (FluMist). The AE report data were analyzed by a novel combinatorial, ontology-based detection of AE method (CODAE). CODAE detects AEs using Proportional Reporting Ratio (PRR), Chi-square significance test, and base level filtration, and groups identified AEs by ontology-based hierarchical classification. In total, 48 TIV-enriched and 68 LAIV-enriched AEs were identified (PRR > 2, Chi-square score > 4, and the number of cases > 0.2% of total reports). These AE terms were classified using the Ontology of Adverse Events (OAE), MedDRA, and SNOMED-CT. The OAE method provided better classification results than the two other methods. Thirteen out of 48 TIV-enriched AEs were related to neurological and muscular processing such as paralysis, movement disorders, and muscular weakness. In contrast, 15 out of 68 LAIV-enriched AEs were associated with inflammatory response and respiratory system disorders. There was evidence of two severe adverse events (Guillain-Barre Syndrome and paralysis) present in TIV. Although these severe adverse events had a low incidence rate, they were found to be more significantly enriched in TIV-vaccinated patients than in LAIV-vaccinated patients. Therefore, our novel combinatorial bioinformatics analysis discovered that LAIV had a lower chance of inducing these two severe adverse events than TIV. In addition, our meta-analysis found that all previously reported positive correlations between GBS and influenza vaccine immunization were based on trivalent influenza vaccines instead of monovalent influenza vaccines. This work was supported by the National Institutes of Health (NIH) grant U54 DA021519 for the National Center for Integrative Biomedical Informatics and NIH National Institute of Allergy and Infectious Diseases (NIAID) grant R01AI081062. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/99110/1/journal.pone.0049941.pd
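
The filtering step described in this abstract (PRR, Chi-square score, and base level filtration) can be illustrated with a minimal sketch. It is not the CODAE implementation: it assumes the standard PRR definition, a 2x2 Chi-square statistic without continuity correction, and that "total reports" means the reports for the vaccine group of interest. The report totals come from the abstract; the per-event counts are hypothetical.

```python
def prr(event_a, total_a, event_b, total_b):
    """Proportional Reporting Ratio: event rate in group A over event rate in group B."""
    if event_b == 0:
        return float("inf")
    return (event_a / total_a) / (event_b / total_b)


def chi_square_2x2(event_a, total_a, event_b, total_b):
    """Pearson Chi-square for the 2x2 table (assumption: no continuity correction)."""
    a, b = event_a, total_a - event_a
    c, d = event_b, total_b - event_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def is_enriched(event_a, total_a, event_b, total_b,
                min_prr=2.0, min_chi2=4.0, min_fraction=0.002):
    """Apply the three thresholds quoted in the abstract to one AE term."""
    return (
        prr(event_a, total_a, event_b, total_b) > min_prr
        and chi_square_2x2(event_a, total_a, event_b, total_b) > min_chi2
        and event_a / total_a > min_fraction  # base level filtration
    )


TIV_REPORTS = 37621   # from the abstract
LAIV_REPORTS = 3707   # from the abstract

# Hypothetical counts for a single AE term: 400 TIV reports, 10 LAIV reports.
print(is_enriched(400, TIV_REPORTS, 10, LAIV_REPORTS))
```

Swapping the argument order (LAIV first, TIV second) would test the same term for LAIV enrichment.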

    Identification of genes associated with complex traits by testing the genetic dissimilarity between individuals

    Using the exome sequencing data from 697 unrelated individuals and their simulated disease phenotypes from Genetic Analysis Workshop 17, we develop and apply a gene-based method to identify the relationship between a gene with multiple rare genetic variants and a phenotype. The method is based on the Mantel test, which assesses the correlation between two distance matrices using a permutation procedure. Using up to 100,000 permutations to estimate the statistical significance in 200 replicate data sets, we found that the method had a 5.1% type I error rate at an α level of 0.05 and had varying power to detect genes with simulated genetic associations. FLT1 and KDR had the most significant correlations with Q1 and were replicated 170 and 24 times, respectively, in 200 simulated data sets using a Bonferroni-corrected p-value of 0.05 as a threshold. These results suggest that the distance correlation method can be used to identify genotype-phenotype association when multiple rare genetic variants in a gene are involved.
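
As an illustration of the permutation procedure mentioned in this abstract, here is a generic Mantel test sketch in plain Python. It is not the authors' gene-based method: the distance matrices, the Pearson statistic, the one-sided p-value, and the function names are all assumptions made for the example.

```python
import random
from itertools import combinations


def _upper(matrix, order):
    """Upper-triangle entries of the matrix after relabelling rows/columns by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    fixed = _upper(dist_b, identity)
    observed = _pearson(_upper(dist_a, identity), fixed)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute individuals in one matrix only
        if _pearson(_upper(dist_a, order), fixed) >= observed:
            hits += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: six individuals whose phenotype distances scale with genotype distances.
geno = [[abs(i - j) for j in range(6)] for i in range(6)]
pheno = [[2 * abs(i - j) for j in range(6)] for i in range(6)]
r, p = mantel_test(geno, pheno)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent; shuffling individual entries would break that structure.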

    Visual Analytics for Epidemiologists: Understanding the Interactions Between Age, Time, and Disease with Multi-Panel Graphs

    Visual analytics, a technique aiding data analysis and decision making, is a novel tool that allows for a better understanding of the context of complex systems. Public health professionals can greatly benefit from this technique since context is integral in disease monitoring and biosurveillance. We propose a graphical tool that can reveal the distribution of an outcome by time and age simultaneously. We introduce and demonstrate multi-panel (MP) graphs applied in four different settings: U.S. national influenza-associated and salmonellosis-associated hospitalizations among the older adult population (≥65 years old), 1991-2004; confirmed salmonellosis cases reported to the Massachusetts Department of Public Health for the general population, 2004-2005; and asthma-associated hospital visits for children aged 0-18 at Milwaukee Children's Hospital of Wisconsin, 1997-2006. We illustrate trends and anomalies that otherwise would be obscured by traditional visualization techniques such as case pyramids and time-series plots. MP graphs can weave together two vital dynamics--temporality and demographics--that play important roles in the distribution and spread of diseases, making these graphs a powerful tool for public health and disease biosurveillance efforts.

    HeLa cells incubated with styryl compound H8

    This is part of a larger collection of archives of 1344 styryl compounds containing digital microscopic images of HeLa cells incubated with styryl compounds, along with other information about those compounds. ** The compounds were synthesized in a combinatorial synthesis involving 8 pyridinium/quinolinium groups (A-H) and 168 aldehyde groups (1-168). The compounds are identified by their A-H letter followed by their 1-168 number. ** Each of the 12 images in this archive is a 512x512 pixel image stored as a bzip2 compressed sequence of 2 byte unsigned short integers (little endian format), in standard raster order. ** Fluorescence in the Hoechst channel derives from Hoechst dye. Fluorescence in the other channels is presumed to derive from the styryl molecules, although cellular autofluorescence is also present. ** The images were acquired at 200X magnification with a 20X objective. The images were acquired with a 12 bit CCD camera so the greyscale intensities range from 0 to 4095. ** The "withdye" images were acquired after around 1 hour of incubation with the styryl probe. Next the probe was washed out using an onboard robotic pipetting system, and the "washout" images were acquired. ** The Hoechst dye images were acquired at 50 msec exposure time; the other channels were acquired at both 1 sec and 200 msec exposure times (Cy5 was only acquired at 1 sec). ** The images were obtained using a Cellomics KineticScan instrument. Cells were kept alive and healthy in an onboard environmental control chamber. ** Styryl compounds were present at 100 micromolar concentration. http://deepblue.lib.umich.edu/bitstream/2027.42/59962/1/H8.ta
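
The image layout described above (512x512 pixels, bzip2-compressed, 2-byte little-endian unsigned integers in raster order) could be decoded roughly as follows. This is a sketch using only the Python standard library; the function name is hypothetical, and the demonstration uses a synthetic image because the file names inside the archive are not given here.

```python
import bz2
import struct

WIDTH = HEIGHT = 512  # image size stated in the archive description


def decode_image(compressed, width=WIDTH, height=HEIGHT):
    """Decode bzip2-compressed little-endian uint16 pixels into a list of rows."""
    raw = bz2.decompress(compressed)
    expected = width * height * 2  # 2 bytes per pixel
    if len(raw) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(raw)}")
    pixels = struct.unpack(f"<{width * height}H", raw)
    # Standard raster order: rows top to bottom, pixels left to right within a row.
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]


# Synthetic stand-in for one archive image; values stay within the 12-bit range.
fake_pixels = [i % 4096 for i in range(WIDTH * HEIGHT)]
fake_file = bz2.compress(struct.pack(f"<{WIDTH * HEIGHT}H", *fake_pixels))

rows = decode_image(fake_file)
print(len(rows), len(rows[0]), max(max(row) for row in rows))
```

For a real file, the compressed bytes would be read with `open(path, "rb").read()` and passed to `decode_image`; because the camera is 12 bit, decoded values above 4095 would suggest a wrong byte order or a corrupted file.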